How to Convert Delta Parquet Files to a Single Parquet File with Latest Version of Delta
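Hello @Richards, Sam (DG-STL-HQ),

Welcome to the MS Q&A platform.

To export the latest version of a Delta table to a single Parquet file, you can use Apache Spark and Delta Lake.

Load the Delta table into a Spark DataFrame. Reading a Delta path returns the latest version of the table by default:

    df = spark.read.format("delta").load(delta_table_path)
    df.show()

Alternatively, get the same latest snapshot through the DeltaTable API:

    from delta.tables import DeltaTable
    delta_table = DeltaTable.forPath(spark, delta_table_path)
    df = delta_table.toDF()
    df.show()

No extra filter for the latest version is needed. The table data has no "version" column to filter on, and both reads above already return only the current snapshot. If you need a specific older version instead, use time travel (version_number is a placeholder for the version you want):

    df = spark.read.format("delta").option("versionAsOf", version_number).load(delta_table_path)
    df.show()

Write out the DataFrame as a single Parquet file. Without coalesce(1), Spark writes one part file per partition:

    df.coalesce(1).write.parquet(parquet_output_path, mode="overwrite")

Note that Spark still creates a directory at parquet_output_path. The directory contains a single part-*.parquet file alongside marker files such as _SUCCESS.

If you have plain Parquet files (not in Delta Lake format), you can use the Apache Spark Python script below to convert the Parquet folder in place to a Delta Lake table. The conversion adds a _delta_log transaction log to the folder, and the identifier must wrap the path in backticks:

    %%pyspark
    from delta.tables import DeltaTable
    deltaTable = DeltaTable.convertToDelta(spark, "parquet.`<path-to-parquet-folder>`")

Reference documents:
https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/synapse-analytics/spark/apache-spark-delta-lake-overview.md
https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/synapse-analytics/sql/query-delta-lake-format.md

I hope this helps. Please let us know if you have any further questions.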
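For reference, the export steps in this answer could be combined roughly as follows. This is only a sketch under some assumptions: the function names export_delta_snapshot and promote_single_part_file, the output_dir and target_file parameters, and the optional version argument are illustrative and not part of any library. The second helper assumes the output directory is on a local or mounted filesystem; on cloud storage paths you would need your platform's filesystem utilities instead.

```python
import shutil
from pathlib import Path


def export_delta_snapshot(spark, delta_table_path, output_dir, version=None):
    """Write one snapshot of a Delta table to output_dir as a single Parquet part file.

    `spark` is an existing SparkSession with Delta Lake configured.
    """
    reader = spark.read.format("delta")
    if version is not None:
        # Time travel: pin a specific table version instead of the latest one.
        reader = reader.option("versionAsOf", version)
    df = reader.load(delta_table_path)
    # coalesce(1) sends all rows through one task, so only one part file is written.
    df.coalesce(1).write.mode("overwrite").parquet(output_dir)


def promote_single_part_file(output_dir, target_file):
    """Move the lone part-*.parquet file out of a Spark output directory.

    Assumes a local or mounted filesystem (hypothetical helper, not a Spark API).
    """
    parts = sorted(Path(output_dir).glob("part-*.parquet"))
    if len(parts) != 1:
        raise ValueError(f"expected exactly one part file, found {len(parts)}")
    shutil.move(str(parts[0]), str(target_file))
    return Path(target_file)
```

A call such as export_delta_snapshot(spark, delta_table_path, output_dir) followed by promote_single_part_file(output_dir, target_file) would leave one Parquet file at target_file. Keep in mind that coalesce(1) pushes all data through a single task, so this approach suits tables that fit comfortably on one executor.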